NSF PAR Search | NSF Public Access Repository

Automatic Tracing in Task-Based Runtime Systems

https://doi.org/10.1145/3669940.3707237

Yadav, Rohan; Bauer, Michael; Broman, David; Garland, Michael; Aiken, Alex; Kjolstad, Fredrik (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
Composing Distributed Computations Through Task and Kernel Fusion

https://doi.org/10.1145/3669940.3707216

Yadav, Rohan; Sundram, Shiv; Lee, Wonchan; Garland, Michael; Bauer, Michael; Aiken, Alex; Kjolstad, Fredrik (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
Compilation of Shape Operators on Sparse Arrays

https://doi.org/10.1145/3689752

Root, Alexander J; Yan, Bobby; Liu, Peiming; Gyurgyik, Christophe; Bik, Aart JC; Kjolstad, Fredrik (October 2024, Proceedings of the ACM on Programming Languages)

We show how to build a compiler for a sparse array language that supports shape operators such as reshaping or concatenating arrays, in addition to compute operators. Existing sparse array programming systems implement generic shape operators for only some sparse data structures, reduce shape operators on other data structures to those, and do not support fusion. Our system compiles sparse array expressions to code that efficiently iterates over reshaped views of irregular sparse data structures, without needing to materialize temporary storage for intermediates. Our evaluation shows that our approach generates sparse array code competitive with popular sparse array libraries: our generated shape operators achieve geometric mean speed-ups of 1.66×–15.3× when compared to hand-written kernels in scipy.sparse and 1.67×–651× when compared to generic implementations in pydata/sparse. For operators that require data structure conversions in these libraries, our generated code achieves geometric mean speed-ups of 7.29×–13.0× when compared to scipy.sparse and 21.3×–511× when compared to pydata/sparse. Finally, our evaluation demonstrates that fusing shape and compute operators improves the performance of several expressions by geometric mean speed-ups of 1.22×–2.23×.
Full Text Available
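
To make the idea of iterating over a reshaped view concrete, here is a minimal hand-written Python sketch. It is not the paper's compiler or its generated code. It assumes a 2-D array in coordinate (COO) form with row-major ordering, and the names `reshaped_view` and `row_sums` are hypothetical. The reshape is expressed as a lazy remapping of stored coordinates, and a row-sum consumes that view directly, so no reshaped temporary is built.

```python
def reshaped_view(coords, values, old_shape, new_shape):
    """Lazily yield ((row, col), value) pairs of a reshaped 2-D COO array.

    Assumption: row-major layout, so a reshape only renumbers coordinates.
    """
    old_rows, old_cols = old_shape
    new_rows, new_cols = new_shape
    if old_rows * old_cols != new_rows * new_cols:
        raise ValueError("reshape must preserve the number of elements")
    for (i, j), v in zip(coords, values):
        linear = i * old_cols + j  # row-major position of the stored entry
        yield (linear // new_cols, linear % new_cols), v


def row_sums(view, n_rows):
    """Sum each row of a coordinate stream without materializing the array."""
    sums = [0.0] * n_rows
    for (r, _c), v in view:
        sums[r] += v
    return sums


# A 3x4 array with three stored entries, viewed as 2x6.
coords = [(0, 1), (1, 2), (2, 0)]
values = [10.0, 20.0, 30.0]

new_coords = [rc for rc, _ in reshaped_view(coords, values, (3, 4), (2, 6))]
fused = row_sums(reshaped_view(coords, values, (3, 4), (2, 6)), 2)
```

Only the stored entries are visited, so the cost scales with the number of nonzeros rather than with the dense size of either shape.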
Compiling Recurrences over Dense and Sparse Arrays

https://doi.org/10.1145/3649820

Sundram, Shiv; Tariq, Muhammad Usman; Kjolstad, Fredrik (April 2024, Proceedings of the ACM on Programming Languages)

We present a framework for compiling recurrence equations into native code. In our framework, users specify a system of recurrences, the types of data structures that store inputs and outputs, and scheduling commands for optimization. Our compiler then lowers these specifications into native code that respects the dependencies in the recurrence equations. Our compiler can generate code over both sparse and dense data structures, and determines if the recurrence system is solvable with the provided scheduling primitives. We evaluate the performance and correctness of the generated code on several recurrences, from domains as diverse as dense and sparse matrix solvers, dynamic programming, graph problems, and sparse tensor algebra. We demonstrate that the generated code has competitive performance to hand-optimized implementations in libraries. However, these handwritten libraries target specific recurrences, specific data structures, and specific optimizations. Our system, on the other hand, automatically generates implementations from recurrences, data formats, and schedules, giving our system more generality than library approaches.
Full Text Available
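
As an illustration of the kind of recurrence this abstract mentions, consider forward substitution for a lower-triangular solve, x_i = (b_i - sum over j < i of L_ij * x_j) / L_ii. The sketch below is a hand-written Python version over a sparse row format, not output of the paper's compiler. The function name `forward_substitution` and the list-of-(column, value) row layout are assumptions chosen for brevity.

```python
def forward_substitution(rows, b):
    """Solve L x = b for lower-triangular L stored as sparse rows.

    rows[i] is a list of (column, value) pairs for row i; each row must
    contain a nonzero diagonal entry. Rows are processed in increasing
    order, which respects the dependency of x[i] on every x[j] with j < i.
    """
    x = [0.0] * len(b)
    for i, row in enumerate(rows):
        acc = b[i]
        diag = None
        for j, val in row:
            if j == i:
                diag = val
            elif j < i:
                acc -= val * x[j]  # x[j] was computed in an earlier step
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0:
            raise ValueError("missing or zero diagonal in row %d" % i)
        x[i] = acc / diag
    return x


# L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]], b = [2, 3, 10]
rows = [[(0, 2.0)], [(0, 1.0), (1, 1.0)], [(1, 3.0), (2, 4.0)]]
x = forward_substitution(rows, [2.0, 3.0, 10.0])
```

The loop order here is fixed by hand. In the framework described above, the iteration order and data formats are derived from the recurrence specification and the schedule.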
Indexed Streams: A Formal Intermediate Representation for Fused Contraction Programs

https://doi.org/10.1145/3591268

Kovach, Scott; Kolichala, Praneeth; Gu, Tiancheng; Kjolstad, Fredrik (June 2023, Proceedings of the ACM on Programming Languages)

We introduce indexed streams, a formal operational model and intermediate representation that describes the fused execution of a contraction language that encompasses both sparse tensor algebra and relational algebra. We prove that the indexed stream model is correct with respect to a functional semantics. We also develop a compiler for contraction expressions that uses indexed streams as an intermediate representation. The compiler is only 540 lines of code, but we show that its performance can match both the TACO compiler for sparse tensor algebra and the SQLite and DuckDB query processing libraries for relational algebra.
Full Text Available
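
One informal way to picture fused execution of a contraction is co-iteration over index-sorted streams, as in a sparse dot product or a merge join. The Python sketch below illustrates that general idea only. It is not the formal indexed stream model from the paper, and the name `fused_dot` is hypothetical. It assumes each input is a list of (index, value) pairs sorted by index with no duplicates.

```python
def fused_dot(a, b):
    """Contract two sorted (index, value) streams into a scalar.

    The multiply and the sum are fused into a single pass: whichever
    stream has the smaller index is advanced, and values are combined
    only where indices match, so no intermediate product is stored.
    """
    total = 0.0
    pa, pb = 0, 0
    while pa < len(a) and pb < len(b):
        ia, va = a[pa]
        ib, vb = b[pb]
        if ia == ib:
            total += va * vb
            pa += 1
            pb += 1
        elif ia < ib:
            pa += 1  # index present only in a: contributes nothing
        else:
            pb += 1  # index present only in b: contributes nothing
    return total


a = [(0, 1.0), (2, 3.0), (5, 2.0)]
b = [(2, 4.0), (3, 1.0), (5, 0.5)]
result = fused_dot(a, b)
```

The same skipping logic applies to an equi-join on a sorted key, which is one reason a single contraction language can cover both sparse tensor algebra and relational algebra.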
Legate Sparse: Distributed Sparse Computing in Python

https://doi.org/10.1145/3581784.3607033

Yadav, Rohan; Lee, Wonchan; Elibol, Melih; Papadakis, Manolis; Lee-Patti, Taylor; Garland, Michael; Aiken, Alex; Kjolstad, Fredrik; Bauer, Michael (November 2023, ACM)

Full Text Available
Onyx: A 12nm 756 GOPS/W Coarse-Grained Reconfigurable Array for Accelerating Dense and Sparse Applications

https://doi.org/10.1109/VLSITechnologyandCir46783.2024.10631383

Koul, Kalhan; Strange, Maxwell; Melchert, Jackson; Carsello, Alex; Mei, Yuchen; Hsu, Olivia; Kong, Taeyoung; Chen, Po-Han; Ke, Huifeng; Zhang, Keyi; et al. (June 2024, IEEE)

Full Text Available
SpDISTAL: Compiling Distributed Sparse Tensor Computations

https://doi.org/10.1109/SC41404.2022.00064

Yadav, Rohan; Aiken, Alex; Kjolstad, Fredrik (November 2022, IEEE/ACM)

We introduce SpDISTAL, a compiler for sparse tensor algebra that targets distributed systems. SpDISTAL combines separate descriptions of tensor algebra expressions, sparse data structures, data distribution, and computation distribution. Thus, it enables distributed execution of sparse tensor algebra expressions with a wide variety of sparse data structures and data distributions. SpDISTAL is implemented as a C++ library that targets a distributed task-based runtime system and can generate code for nodes with both multi-core CPUs and multiple GPUs. SpDISTAL generates distributed code that achieves performance competitive with hand-written distributed functions for specific sparse tensor algebra expressions and that outperforms general interpretation-based systems by one to two orders of magnitude.
Full Text Available
Mosaic: An Interoperable Compiler for Tensor Algebra

https://doi.org/10.1145/3591236

Bansal, Manya; Hsu, Olivia; Olukotun, Kunle; Kjolstad, Fredrik (June 2023, Proceedings of the ACM on Programming Languages)

We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to external functions of other tensor algebra libraries and compilers. Users can extend Mosaic by adding new functions and bind a sub-expression to a function using a scheduling API. Mosaic substitutes the bound sub-expressions with calls to the external functions and automatically generates the remaining code using a default code generator. As the generated code is fused by default, users can productively leverage both fusion and calls to specialized functions within the same compiler. We demonstrate the benefits of our dual approach by showing that calling hand-written CPU and specialized hardware functions can provide speedups of up to 206× against fused code in some cases, while generating fused code can provide speedups of up to 3.57× against code that calls external functions in other cases. Mosaic also offers a search system that can automatically map an expression to a set of registered external functions. Both the explicit binding and automatic search are verified by Mosaic. Additionally, the interface for adding new external functions is simple and general. Currently, 38 external functions have been added to Mosaic, with each addition averaging 20 lines of code.